A Computation and Communication Efficient Method for Distributed Nonconvex Problems in the Partial Participation Setting
We present a new method that combines three key components of distributed optimization and federated learning: variance reduction of stochastic gradients, partial participation, and compressed communication. We prove that the new method has optimal oracle complexity and state-of-the-art communication complexity in the partial participation setting. Even without the communication compression component, our method successfully combines variance reduction and partial participation: we obtain the optimal oracle complexity, never need the participation of all nodes, and do not require a bounded gradient (dissimilarity) assumption.
Complexity of Highly Parallel Non-Smooth Convex Optimization
A landmark result of non-smooth convex optimization is that gradient descent is an optimal algorithm whenever the number of computed gradients is smaller than the dimension $d$. In this paper we study the extension of this result to the parallel optimization setting. Namely, we consider optimization algorithms interacting with a highly parallel gradient oracle, that is, one that can answer $\mathrm{poly}(d)$ gradient queries in parallel. We show that in this case gradient descent is optimal only up to $\tilde{O}(\sqrt{d})$ rounds of interaction with the oracle. This lower bound improves upon a decades-old construction by Nemirovski, which proves optimality only up to $d^{1/3}$ rounds (as recently observed by Balkanski and Singer); the suboptimality of gradient descent after $\sqrt{d}$ rounds was already observed by Duchi, Bartlett and Wainwright. In the latter regime we propose a new method with improved complexity, which we conjecture to be optimal. The analysis of this new method is based upon a generalized version of the recent results on optimal acceleration for highly smooth convex optimization.
Faster Certified Symmetry Breaking Using Orders With Auxiliary Variables
Anders, Markus, Bogaerts, Bart, Bogø, Benjamin, Gontier, Arthur, Koops, Wietze, McCreesh, Ciaran, Myreen, Magnus O., Nordström, Jakob, Oertel, Andy, Rebola-Pardo, Adrian, Tan, Yong Kiam
Symmetry breaking is a crucial technique in modern combinatorial solving, but it is difficult to be sure it is implemented correctly. The most successful approach to deal with bugs is to make solvers certifying, so that they output not just a solution, but also a mathematical proof of correctness in a standard format, which can then be checked by a formally verified checker. This requires justifying symmetry reasoning within the proof, but developing efficient methods for this has remained a long-standing open challenge. A fully general approach was recently proposed by Bogaerts et al. (2023), but it relies on encoding lexicographic orders with big integers, which quickly becomes infeasible for large symmetries. In this work, we develop a method for instead encoding orders with auxiliary variables. We show that this leads to orders-of-magnitude speed-ups in both theory and practice by running experiments on proof logging and checking for SAT symmetry breaking using the state-of-the-art satsuma symmetry breaker and the VeriPB proof checking toolchain.
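The auxiliary-variable idea can be illustrated with the classic chained lex-leader encoding (a generic sketch, not the paper's exact construction or satsuma/VeriPB's actual output format): one fresh variable per position records "the two assignments agree on the prefix so far", so the clause count grows linearly in the symmetry's support instead of requiring big-integer coefficients.

```python
def lex_leq_clauses(xs, ys, aux_start):
    """CNF clauses (DIMACS-style integer lists) enforcing xs <=_lex ys.

    Auxiliary variable aux_start + j means "xs and ys agree on
    positions 0..j".  The encoding is the standard chained one:
    3*len(xs) - 2 clauses total, no big-integer arithmetic.
    """
    n = len(xs)
    clauses = [[-xs[0], ys[0]]]            # position 0: x0 <= y0
    for i in range(1, n):
        e = aux_start + (i - 1)            # e: prefix 0..i-1 equal
        # prefix equal -> x_i <= y_i
        clauses.append([-e, -xs[i], ys[i]])
        # force e true whenever the prefix really is equal
        if i == 1:
            clauses.append([e, xs[0], ys[0]])
            clauses.append([e, -xs[0], -ys[0]])
        else:
            ep = aux_start + (i - 2)       # previous prefix variable
            clauses.append([-ep, e, xs[i - 1], ys[i - 1]])
            clauses.append([-ep, e, -xs[i - 1], -ys[i - 1]])
    return clauses
```

For a symmetry mapping variables `xs` to `ys`, these clauses assert that the original assignment is lexicographically no larger than its symmetric image, which is exactly the lex-leader constraint that symmetry breaking adds.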
DeepSeek may have found a new way to improve AI's ability to remember
An AI model released by the Chinese AI company DeepSeek uses new techniques that could significantly improve AI's ability to "remember." Released last week, the optical character recognition (OCR) model works by extracting text from an image and turning it into machine-readable words. This is the same technology that powers scanner apps, translation of text in photos, and many accessibility tools. OCR is already a mature field with numerous high-performing systems, and according to the paper and some early reviews, DeepSeek's new model performs on par with top models on key benchmarks. But researchers say the model's main innovation lies in how it processes information: specifically, how it stores and retrieves memories. Improving how AI models "remember" information could reduce the computing power they need to run, thus mitigating AI's large (and growing) carbon footprint.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The authors propose a novel approach for hierarchical clustering of multivariate data. They construct cluster trees by estimating minimum volume sets using the q-One-Class SVM, and evaluate their method on a synthetic data set and two real-world applications. While their new method seems to perform better than other approaches based on density estimation, I am not convinced of its practical benefits, as the authors did not compare their method to the most commonly used hierarchical clustering techniques (agglomerative clustering with average linkage/Ward). Minor comment: rather than splitting their data once into a training and a test set, the authors should perform 10-fold/5-fold cross-validation for a more reliable estimate of the generalizability of their method.
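The reviewer's suggestion is standard practice: instead of one train/test split, rotate k disjoint folds through the test role and average the k scores. A minimal stdlib sketch (function names are illustrative, not from the paper under review):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split sample indices 0..n-1 into k disjoint, shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(n, k, fit_and_score):
    """Run k-fold CV; fit_and_score(train, test) returns one score.

    Each fold serves once as the test set while the remaining k-1
    folds form the training set, giving k performance estimates
    instead of the single one from a one-off train/test split.
    """
    folds = k_fold_indices(n, k)
    scores = []
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(fit_and_score(train, test))
    return sum(scores) / k
```

Averaging over folds reduces the variance of the performance estimate, which is exactly why the reviewer considers it more reliable than a single split.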
- Overview (0.56)
- Research Report > Promising Solution (0.35)
In the synthetic experiments, it was chosen such that the F-score is maximized. Forcing TiWnet to produce a sparser topology might appear logical at first, but this would remove false _and_ true positive edges alike. In TiWnet's model, feature independence is hard-coded by design, so it is forced to explain all observations by structure alone; it cannot distinguish these two sources of input.
Fix damaged art in hours with AI
In his study, Alex Kachkine, SM '23, presents a new method he's developed that involves printing the restoration on a very thin polymer film that can be carefully aligned with a painting and adhered to it or easily removed. As a demonstration, he used the method to repair a highly damaged 15th-century oil painting he owned. First he used traditional techniques to clean the painting and remove any past restoration efforts. Then he scanned the painting, including the many regions where paint had faded or cracked, and used existing algorithms to create a virtual version of what it may have looked like originally. Next, Kachkine used software he developed to create a map of regions on the original painting that require infilling, along with the exact colors needed.
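The mapping step can be illustrated with a toy sketch (this is not Kachkine's actual software): compare the damaged scan with the virtual restoration pixel by pixel and record, for each spot that differs, the exact color the printed film should carry there.

```python
def infill_map(damaged, restored, tol=10):
    """Toy version of the 'map of regions requiring infilling'.

    Both images are 2-D grids of grayscale values.  Wherever the
    damaged scan differs from the virtual restoration by more than
    `tol`, record the value the restoration calls for; pixels that
    already match need no paint on the film.
    """
    to_fill = {}
    for r, row in enumerate(damaged):
        for c, val in enumerate(row):
            target = restored[r][c]
            if abs(val - target) > tol:
                to_fill[(r, c)] = target  # color to print at (r, c)
    return to_fill
```

The tolerance keeps minor scanning noise out of the map, so only genuinely faded or cracked regions get infilled.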
Fast approximative estimation of conditional Shapley values when using a linear regression model or a polynomial regression model
We develop a new approximate estimation method for conditional Shapley values obtained with a linear regression model, and we outperform existing methodology and implementations. Compared to the sequential method in the shapr package (i.e., fitting one model at a time), our method runs in minutes rather than hours. Compared to the iterative method in the shapr package, we obtain better estimates in less than, or about the same, amount of time. Even when the number of covariates becomes large, our method can still fit thousands of regression models at once. We focus on a linear regression model, but the method extends easily to several types of splines that can be estimated using multivariate linear regression, owing to linearity in the parameters.
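The "thousands of models at once" idea rests on a standard linear-algebra fact: OLS with a shared design matrix and many response vectors is a single batched least-squares solve. A minimal NumPy sketch (illustrative, not the shapr implementation):

```python
import numpy as np

def fit_many_ols(X, Y):
    """Fit one OLS regression per column of Y against a shared design
    matrix X, in a single batched solve.

    X : (n, p) design matrix shared by all models
    Y : (n, m) responses; column j is the target of model j
    Returns B : (p, m); column j holds model j's coefficients.

    Because OLS is linear in the parameters, one least-squares call
    with an (n, m) right-hand side fits all m models simultaneously.
    """
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return B
```

Fitting, say, m = 10,000 models on n = 1,000 samples is then one `lstsq` call instead of a Python loop over models, which is where the minutes-versus-hours difference comes from.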